Part 1: Pre-processing and smoothing satellite image time series

Loading and pre-processing data

Load the packages:

if (!require("pacman")) install.packages("pacman")
pacman::p_load("tidyverse", "tsibble", "bfast", "data.table", "mgcv", "forecast", 
               "anytime", "fabletools", "signal", "fable", "tibble", "zoo",
               "sf", "sits", "gdalcubes", "terra")

Import data - raw MTCI values from Sentinel-2 acquired from GEE:

df = read.csv("d:/OGH2023_data/species_mtci.csv")
str(df)
## 'data.frame':    1446996 obs. of  3 variables:
##  $ system.index: chr  "20170329T095021_20170329T095024_T33UXR_00000000000000000042" "20170329T095021_20170329T095024_T33UXR_00000000000000000043" "20170329T095021_20170329T095024_T33UXR_00000000000000000044" "20170329T095021_20170329T095024_T33UXR_00000000000000000045" ...
##  $ MTCI        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ species_cd  : chr  "GB" "GB" "GB" "GB" ...
summary(df)
##  system.index            MTCI          species_cd       
##  Length:1446996     Min.   :-8.2      Length:1446996    
##  Class :character   1st Qu.: 1.3      Class :character  
##  Mode  :character   Median : 3.1      Mode  :character  
##                     Mean   : 3.0                        
##                     3rd Qu.: 4.4                        
##                     Max.   :17.8                        
##                     NA's   :1430563

As you can see, there are many NA values; moreover, plotting the index values reveals numerous outliers.

plot(df$MTCI)

Parse the date from the system.index column and rename the columns:

df$system.index = as.Date(df$system.index, format =  "%Y%m%d")
names(df) = c("date", "index", "type")
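
To see what as.Date() does here: with format = "%Y%m%d" it parses the leading date digits of the GEE system:index string and ignores the rest. A quick check on one ID from the data:

```r
# Parse the acquisition date from a GEE system:index string;
# as.Date()/strptime() stop reading once the format is satisfied,
# so the trailing "_T33UXR_..." part is ignored
id = "20170329T095021_20170329T095024_T33UXR_00000000000000000042"
as.Date(id, format = "%Y%m%d")
# "2017-03-29"
```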

Filter the values: drop NAs, average duplicates (same date and species), and keep only a plausible index and date range:

df_clean = df %>% 
  drop_na() %>%
  group_by(date, type) %>%
  summarise(index = mean(index)) %>% 
  ungroup() %>%
  dplyr::filter(index < 7 & index > 0 & date > "2018-03-05") 
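
The group_by() + summarise() step collapses multiple pixel observations per date and species into one mean value. A toy base-R equivalent with aggregate() (hypothetical values):

```r
# Averaging duplicate (date, type) observations, as the pipeline above does
d = data.frame(date = c("2018-04-01", "2018-04-01", "2018-04-02"),
               type = "GB",
               index = c(2, 4, 5))
agg = aggregate(index ~ date + type, data = d, FUN = mean)
agg
# 2018-04-01 -> mean index 3; 2018-04-02 -> 5
```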

ggplot(df_clean, aes(date, index, color = type))+
  geom_point()+
  theme_light()

Example analysis on the hornbeam (GB) time series

df_gb = df_clean %>%
  dplyr::filter(type == "GB")
ggplot(df_gb, aes(date, index))+
  geom_line(color = "darkgreen", linewidth = 1)+
  theme_light()

First, identify outliers and replace them by interpolation using tsclean() from the forecast package:

df_gb$index_out = df_gb$index %>% 
  tsclean()

ggplot(df_gb)+
  geom_point(aes(date, index), color = "red")+
  geom_point(aes(date, index_out), color = "darkgreen")+
  theme_light()
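
The idea behind outlier replacement can be sketched in base R: flag values far from the series median (a rough MAD rule here, not forecast's actual algorithm) and fill them by linear interpolation:

```r
# Rough outlier replacement (base-R sketch; tsclean() is more sophisticated):
# flag points far from the median, then linearly interpolate over them
x = c(3, 3.2, 9.5, 3.1, 2.9, 3.0, -4, 3.1)     # two obvious outliers
bad = abs(x - median(x)) > 3 * mad(x)
x_clean = x
x_clean[bad] = approx(which(!bad), x[!bad], xout = which(bad))$y
x_clean
# 3.00 3.20 3.15 3.10 2.90 3.00 3.05 3.10
```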

Smoothing 1: Simple Moving Average

df_gb$avg3 = rollmean(df_gb$index_out, 3, fill = NA)
df_gb$avg10 = rollmean(df_gb$index_out, 10, fill = NA)

ggplot(df_gb)+
  geom_line(aes(date, index_out))+
  geom_line(aes(date, avg3), color = "blue", linewidth = 1.0)+
  geom_line(aes(date, avg10), color = "red", linewidth = 1.0)+
  theme_light()
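
rollmean() comes from the zoo package; the same centered moving average can be reproduced with base R's stats::filter(), which is handy for checking what the window does:

```r
# Centered 3-term moving average with base R; the ends become NA,
# matching rollmean(..., fill = NA)
x = c(1, 2, 4, 8, 16, 8, 4)
avg3 = as.numeric(stats::filter(x, rep(1/3, 3), sides = 2))
avg3
# NA 2.333 4.667 9.333 10.667 9.333 NA
```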

Smoothing 2: Savitzky-Golay smoothing

We need to create a regular (1-day) time series with NA for missing values.

df_bfast = bfastts(df_gb$index_out, df_gb$date, type = "irregular")
df_tibble = tibble(date = seq(as.Date(df_gb$date[1]), by = "day", 
                              length.out = length(df_bfast)), value = df_bfast) %>%
  as_tsibble(index = date) 
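
What bfastts() produces here can be sketched in base R: a daily date grid with observations placed at their acquisition days and NA elsewhere (toy dates and values):

```r
# Expand irregular observations onto a daily grid (sketch of what
# bfastts(type = "irregular") does; dates and values are made up)
obs_dates = as.Date(c("2018-03-06", "2018-03-09", "2018-03-16"))
obs_vals  = c(2.4, 2.7, 3.1)
grid  = seq(min(obs_dates), max(obs_dates), by = "day")
daily = rep(NA_real_, length(grid))
daily[match(obs_dates, grid)] = obs_vals
```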

Interpolate the missing values by fitting an ARIMA model (ARIMA() requires equally spaced observations):

df_inter = df_tibble %>% 
  model(arima = ARIMA(value ~ -1 + pdq(0,1,0) + PDQ(0,0,0))) %>% 
  fabletools::interpolate(df_tibble)
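
For comparison, the simplest gap filler is linear interpolation with base R's approx(); the ARIMA(0,1,0) route above is model-based, but both leave the observed points untouched:

```r
# Filling NA gaps by linear interpolation (base R); a simpler
# alternative to the model-based ARIMA interpolation above
v  = c(2.1, NA, NA, 3.0, NA, 2.4)
ok = !is.na(v)
filled = approx(which(ok), v[ok], xout = seq_along(v))$y
filled
# 2.1 2.4 2.7 3.0 2.7 2.4
```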

Apply the Savitzky-Golay filter. n is the filter length (an odd number); test different values.

df_inter$sg = df_inter$value %>%
  sgolayfilt(n = 71) 
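
Conceptually, Savitzky-Golay fits a low-degree polynomial by least squares to each sliding window and takes the fitted value at the window centre. A minimal base-R sketch (hypothetical helper, not the signal package's implementation):

```r
# Minimal Savitzky-Golay sketch: degree-p polynomial fit in each
# centered window of odd length n, evaluated at the window midpoint
sg_smooth = function(y, n = 7, p = 3) {
  stopifnot(n %% 2 == 1, p < n)
  h   = (n - 1) / 2
  out = rep(NA_real_, length(y))
  for (i in (h + 1):(length(y) - h)) {
    d      = data.frame(t = (i - h):(i + h), y = y[(i - h):(i + h)])
    fit    = lm(y ~ stats::poly(t, p, raw = TRUE), data = d)
    out[i] = predict(fit, data.frame(t = i))
  }
  out
}
```

A degree-3 S-G filter reproduces any cubic signal exactly in the interior, which is why it preserves peaks better than a moving average.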

Analyze the results on a plot: the line represents the time series smoothed by S-G, red dots the original values, and blue dots the interpolated values.

df_inter$original = df_tibble$value
ggplot(df_inter)+
  geom_point(aes(date, value), color = "blue", alpha = 0.3)+
  geom_point(aes(date, original), color = "red", alpha = 0.4)+
  geom_line(aes(date, sg), linewidth = 1)+
  theme_light()

Smoothing 3: Generalized Additive Models (GAM) fitting

The first step is the same as for Savitzky-Golay: create a regular time series with NA values:

df_bfast = bfastts(df_gb$index_out, df_gb$date, type = "irregular")
df_tibble = tibble(date = seq(as.Date(df_gb$date[1]), by = "day", 
                              length.out = length(df_bfast)), value = df_bfast) %>%
  as_tsibble(index = date) %>%
  ts() %>% # here we also convert the dates from y-m-d into a numeric index for gamm()
  as.data.frame()

But now we can go straight to modelling without dealing with missing values!
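
The reason this works is that spline-based smoothers fit only the observed rows. Base R's smooth.spline() behaves the same way and makes a quick stand-in sketch (synthetic data, not the gamm() model used in this tutorial):

```r
# A penalized spline fitted to irregular, gappy data: only the observed
# points enter the fit, but we can predict on every day of the grid
set.seed(1)
t = 1:200
y = sin(t / 20) + rnorm(200, sd = 0.2)
y[sample(200, 80)] = NA              # simulate missing acquisitions
ok   = !is.na(y)
fit  = smooth.spline(t[ok], y[ok])
pred = predict(fit, t)$y             # no NAs in the smoothed curve
```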

GAM modelling with dates as the predictor (gamm() fits Generalized Additive Mixed Models):

model = gamm(value ~ s(date, k = 60), 
             data = df_tibble, method = "REML")

Predicting values using GAM model and plotting the results:

df_tibble$predicted = predict.gam(model$gam, df_tibble)
df_tibble$date =  anydate(df_tibble$date)

ggplot(df_tibble)+
  geom_point(aes(date, value), color = "red", alpha = 0.4)+
  geom_line(aes(date, predicted), linewidth = 1)+
  theme_light()

Compare S-G (blue line) and GAM (black line):

df_tibble$sg = df_inter$sg

ggplot(df_tibble)+
  geom_point(aes(date, value), color = "red", alpha = 0.4)+
  geom_line(aes(date, predicted), alpha = 0.8, linewidth = 1)+
  geom_line(aes(date, sg), color = "blue", alpha = 0.8, linewidth = 1)+
  theme_light()

Part 3: SITS package

In this part, we will test some basics of the SITS* package, which is designed for satellite image time series analysis; the time series are stored as Earth observation data cubes. The book about the SITS package can be found here

*Rolf Simoes, Gilberto Camara, Gilberto Queiroz, Felipe Souza, Pedro R. Andrade, Lorena Santos, Alexandre Carvalho, and Karine Ferreira. “Satellite Image Time Series Analysis for Big Earth Observation Data”. Remote Sensing, 13: 2428, 2021. doi:10.3390/rs13132428

Collections that can be used in the SITS package are available in cloud services, such as Amazon Web Services or Microsoft’s Planetary Computer.

Firstly, we will try to find a collection of Landsat imagery from MPC (Microsoft Planetary Computer); you can specify your own region of interest (roi).

L8_cube = sits_cube(
  source = "MPC",
  collection = "LANDSAT-C2-L2",
  bands = c("RED", "SWIR16", "NIR08", "CLOUD"),
  roi = c("lat_min" = 50.026, "lon_min" = 19.902, "lat_max" = 50.027, "lon_max" = 19.903),
  start_date = "2021-06-01",
  end_date = "2022-08-01",
  progress = TRUE)
View(L8_cube)
sits_timeline(L8_cube)

Select one tile with low cloud cover and plot it:

plot(L8_cube,
     red = "SWIR16", green = "NIR08", blue = "RED",
     date = "2021-06-18")

As the time series are irregular, they need to be converted to regular data cubes before further processing in sits. It may take some time…

reg_cube = sits_regularize(
  cube       = L8_cube,
  output_dir = tempdir(),
  res        = 120,
  period     = "P1M",
  multicores = 4)

And calculate NDVI:

reg_cube_ndvi = sits_apply(reg_cube,
                      NDVI = (NIR08 - RED) / (NIR08 + RED),
                      output_dir = tempdir(),
                      multicores = 4,
                      memsize = 12)

plot(reg_cube_ndvi, band = "NDVI", palette = "RdYlGn")
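
The expression passed to sits_apply() is ordinary band arithmetic; on plain reflectance numbers the NDVI formula gives:

```r
# NDVI = (NIR - RED) / (NIR + RED), here on made-up reflectance values
nir  = c(0.45, 0.50, 0.30)
red  = c(0.08, 0.10, 0.25)
ndvi = (nir - red) / (nir + red)
round(ndvi, 3)
# 0.698 0.667 0.091
```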